Create dummy variables in PostgreSQL

Is it possible to create a dummy variable when querying?
For instance, the query below gives me only the observations that satisfy the var2 condition. I also want the remaining observations, but with some kind of tag on them (0/1 indicator values would be sufficient):
SELECT DISTINCT ON (id) id, var1, var2, var3
FROM table
WHERE var2 = ANY('{blue,yellow}');
Have
+-----+------+--------+------+
| id  | Var1 | Var2   | Var3 |
+-----+------+--------+------+
| 345 | 12   | Blue   | 3456 |
| 345 | 12   | Red    | 2134 |
| 346 | 45   | Blue   | 3451 |
| 347 | 25   | yellow | 1526 |
+-----+------+--------+------+
Want
+-----+------+--------+------+--------------------+
| id  | Var1 | Var2   | Var3 | Indicator variable |
+-----+------+--------+------+--------------------+
| 345 | 12   | Blue   | 3456 | 1                  |
| 345 | 12   | Red    | 2134 | 0                  |
| 346 | 45   | Blue   | 3451 | 1                  |
| 347 | 25   | yellow | 1526 | 1                  |
+-----+------+--------+------+--------------------+

Instead of an expression in WHERE, you can use an expression in the SELECT output list:
=> select a, a = any('{1,2,3,5,7}') as asmallprime
   from generate_series(1,10) as a;
 a  | asmallprime
----+-------------
  1 | t
  2 | t
  3 | t
  4 | f
  5 | t
  6 | f
  7 | t
  8 | f
  9 | f
 10 | f
(10 rows)

Tometzky's answer is sufficient, but if you want something more complex you can also use a CASE expression.
Tometzky's example, extended with CASE and an extra indicator:
SELECT a, CASE WHEN a = any('{1,2,3,5,7}') THEN 'YES'
               WHEN a = any('{4,9}') THEN 'SQUARE'
               ELSE 'NO' END AS asmallprime
FROM generate_series(1,10) as a;
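Applied to the table in the question, the boolean can simply be cast to an integer to get the 0/1 column. A minimal sketch, with t standing in for the question's table (the DISTINCT ON is dropped since the desired output keeps every row, and lower() is used because = ANY compares text case-sensitively, so 'Blue' would not match 'blue'):
SELECT id, var1, var2, var3,
       (lower(var2) = ANY('{blue,yellow}'))::int AS indicator
FROM t;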

Related

pyspark PipelinedRDD fit to DataFrame column

First of all, I'm new to the Python and Spark world.
I have homework from university, but I'm stuck in one place.
I did a clustering of my data, and now I have my clusters in a PipelinedRDD after this:
cluster = featurizedScaledRDD.map(lambda r: kmeansModelMllib.predict(r))
# e.g. cluster = [2, 1, 2, 0, 0, 0, 1, 2]
Now I have cluster and my DataFrame dataDf, and I need to fit my cluster as a new column to dataDf.
I have:            I need:
+---+---+---+      +---+---+---+-------+
| x | y | z |      | x | y | z |cluster|
+---+---+---+      +---+---+---+-------+
| 0 | 1 | 1 |      | 0 | 1 | 1 |   2   |
| 0 | 0 | 1 |      | 0 | 0 | 1 |   1   |
| 0 | 8 | 0 |      | 0 | 8 | 0 |   2   |
| 0 | 8 | 0 |      | 0 | 8 | 0 |   0   |
| 0 | 1 | 0 |      | 0 | 1 | 0 |   0   |
+---+---+---+      +---+---+---+-------+
You can add an index using zipWithIndex, join, and convert back to a DataFrame:
swp = lambda x: (x[1], x[0])
cluster.zipWithIndex().map(swp).join(dataDf.rdd.zipWithIndex().map(swp)) \
    .values().toDF(["cluster", "point"])
In some cases it should be possible to use zip:
cluster.zip(dataDf.rdd).toDF(["cluster", "point"])
You can follow with .select("cluster", "point.*") to flatten the output.
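For reference, here is a self-contained sketch of the zipWithIndex route, with toy data standing in for dataDf and the predicted clusters (the names come from the question; the data is made up):
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("attach-clusters").getOrCreate()

# stand-in for the question's dataDf
dataDf = spark.createDataFrame(
    [(0, 1, 1), (0, 0, 1), (0, 8, 0), (0, 8, 0), (0, 1, 0)], ["x", "y", "z"])

# stand-in for the RDD produced by kmeansModelMllib.predict
cluster = spark.sparkContext.parallelize([2, 1, 2, 0, 0])

swp = lambda x: (x[1], x[0])  # (value, index) -> (index, value)

result = (cluster.zipWithIndex().map(swp)
          .join(dataDf.rdd.zipWithIndex().map(swp))
          .values()
          .toDF(["cluster", "point"]))

# flatten the struct column back into x, y, z
result.select("cluster", "point.*").show()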

How to use grep for such complicated expressions?

+----+-------+-----+
| ID | STORE | QTY |
+----+-------+-----+
|  9 |   101 |  18 |
|  8 |   154 |  19 |
|  7 |   111 |  13 |
|  9 |   154 |  18 |
|  8 |   101 |  19 |
|  7 |   101 |  13 |
|  9 |   111 |  18 |
|  8 |   111 |  19 |
|  7 |   154 |  14 |
+----+-------+-----+
Suppose that I have 3 stores, and I'd like to find, for every ID, whether the QTY is the same in all of them.
E.g. ID 9 is in 3 stores and has QTY 18 in every one; but ID 7, while also in 3 stores, has an equal QTY in only two of them (13 in stores 111 and 101; in store 154 it has 14). How can I get that result using grep?
Or do you think it is impossible in one expression? I thought about a regex, but I don't see how to capture the QTY and compare it to another row. My file looks like the table above.
Extract the ID and QTY columns (the first and the last) with cut, count the unique combinations, and output only those whose count is 3, i.e. the QTY is the same for all three stores. Assuming the table is in file:
$ cut -d\| -f2,4 file | sort | uniq -c | grep '^ *3 '
      3  8 | 19
      3  9 | 18
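If grep is not a hard requirement, awk can do the counting in a single expression; a sketch, under the same assumption that the table lives in file:
awk -F'|' '{ c[$2 "|" $4]++ }    # count each (ID, QTY) pair
    END { for (k in c) if (c[k] == 3) print k }' file
This prints the (ID | QTY) pairs that occur in all three stores, e.g. " 9 | 18" and " 8 | 19" (the embedded spaces come straight from the input fields).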

iReport - XY Chart: Disable connection defined through x-value

I have a subdataset for a chart in my report that looks like:
+------+---+----+
| name | x | y  |
+------+---+----+
| a    | 0 |  9 |
| b    | 1 | 13 |
| c    | 2 | 20 |
| d    | 3 | 22 |
| e    | 4 | 23 |
| f    | 4 | 24 |
| g    | 3 | 17 |
| h    | 2 | 14 |
| i    | 1 | 10 |
| j    | 0 |  3 |
+------+---+----+
This creates a chart that connects the points in the order a-j-b-i-c-h-d-g-e-f (ordered by x value).
But I want my chart to follow the table order, a-b-c-d-e-f-g-h-i-j.
How can I do that with iReport?
I have found the solution:
You have to add the attribute autoSort to your xySeries in the XML code:
<xyDataset>
    <dataset .../>
    <xySeries autoSort="false">
        ...
    </xySeries>
</xyDataset>

How to join two files with awk/sed/grep/bash, similar to SQL JOIN

How do I join two files with awk/sed/grep/bash, similar to SQL JOIN?
I have a file (output1.csv), and another one (aecprda12.tab); here is a text version of the second one:
+----------+------------------+------+------------+----+---------------------------------------------------+---------------------------------------------------+-----+-----+-----+------+-------+-------+--------------+------------+--+--+---+---+----+--+---+---+----+------------+------------+------------+------------+
| 21548598 | DSND001906102.2 | 0107 | 001906102 | 02 | FROZEN / O.S.T. | FROZEN / O.S.T. | 001 | 024 | | | 11.49 | 13.95 | 050087295745 | 11/25/2013 | | | N | N | 30 | | 1 | E | 1 | 10/07/2013 | 02/27/2014 | 10/07/2013 | 10/07/2013 |
| 25584998 | WD1194190DVD | 0819 | 1194190 | 18 | FROZEN / (WS DOL DTS) | FROZEN / (WS DOL DTS) | 050 | 110 | | G | 21.25 | 29.99 | 786936838961 | 03/18/2014 | | | N | N | 0 | | 1 | A | 2 | 12/20/2013 | 03/13/2014 | 12/20/2013 | 12/20/2013 |
| 25812794 | WHV1000292717BR | 0526 | 1000292717 | BR | GRAVITY / (UVDC) | GRAVITY / (UVDC) | 050 | 093 | | PG13 | 29.49 | 35.99 | 883929244577 | 02/25/2014 | | | N | N | 30 | | 1 | E | 3 | 01/16/2014 | 02/11/2014 | 01/16/2014 | 01/16/2014 |
| 24475594 | SNY303251.2 | 0085 | 303251 | 02 | BEYONCE | BEYONCE | 001 | 004 | | | 14.99 | 17.97 | 888430325128 | 12/20/2013 | | | N | N | 30 | | 1 | A | 4 | 12/19/2013 | 01/02/2014 | 12/19/2013 | 12/19/2013 |
| 25812787 | WHV1000284958DVD | 0526 | 1000284958 | 18 | GRAVITY (2PC) / (UVDC SPEC 2PK) | GRAVITY (2PC) / (UVDC SPEC 2PK) | 050 | 093 | | PG13 | 21.25 | 28.98 | 883929242528 | 02/25/2014 | | | N | N | 30 | | 1 | E | 5 | 01/16/2014 | 02/11/2014 | 01/16/2014 | 01/16/2014 |
| 21425462 | PBSDMST64400DVD | E349 | 64400 | 18 | MASTERPIECE CLASSIC: DOWNTON ABBEY SEASON 4 (3PC) | MASTERPIECE CLASSIC: DOWNTON ABBEY SEASON 4 (3PC) | 050 | 095 | 094 | | 30.49 | 49.99 | 841887019705 | 01/28/2014 | | | N | N | 30 | | 1 | A | 6 | 09/06/2013 | 01/15/2014 | 09/06/2013 | 09/06/2013 |
| 25584974 | WD1194170BR | 0819 | 1194170 | BR | FROZEN (2PC) (W/DVD) / (WS AC3 DTS 2PK DIGC) | FROZEN (2PC) (W/DVD) / (WS AC3 DTS 2PK DIGC) | 050 | 110 | | G | 27.75 | 39.99 | 786936838923 | 03/18/2014 | | | N | N | 0 | | 2 | A | 7 | 12/20/2013 | 03/13/2014 | 01/15/2014 | 01/15/2014 |
| 21388262 | HBO1000394029DVD | 0203 | 1000394029 | 18 | GAME OF THRONES: SEASON 3 | GAME OF THRONES: SEASON 3 | 050 | 095 | 093 | | 47.99 | 59.98 | 883929330713 | 02/18/2014 | | | N | N | 30 | | 1 | E | 8 | 08/29/2013 | 02/28/2014 | 08/29/2013 | 08/29/2013 |
| 25688450 | WD11955700DVD | 0819 | 11955700 | 18 | THOR: THE DARK WORLD / (AC3 DOL) | THOR: THE DARK WORLD / (AC3 DOL) | 050 | 093 | | PG13 | 21.25 | 29.99 | 786936839500 | 02/25/2014 | | | N | N | 30 | | 1 | A | 9 | 12/24/2013 | 02/20/2014 | 12/24/2013 | 12/24/2013 |
| 23061316 | PRT359054DVD | 0818 | 359054 | 18 | JACKASS PRESENTS: BAD GRANDPA / (WS DUB SUB AC3) | JACKASS PRESENTS: BAD GRANDPA / (WS DUB SUB AC3) | 050 | 110 | | R | 21.75 | 29.98 | 097363590545 | 01/28/2014 | | | N | N | 30 | | 1 | E | 10 | 12/06/2013 | 03/12/2014 | 12/06/2013 | 12/06/2013 |
| 21548611 | DSND001942202.2 | 0107 | 001942202 | 02 | FROZEN / O.S.T. (BONUS CD) (DLX) | FROZEN / O.S.T. (BONUS CD) (DLX) | 001 | 024 | | | 14.09 | 19.99 | 050087299439 | 11/25/2013 | | | N | N | 30 | | 1 | E | 11 | 10/07/2013 | 02/06/2014 | 10/07/2013 | 10/07/2013 |
+----------+------------------+------+------------+----+---------------------------------------------------+---------------------------------------------------+-----+-----+-----+------+-------+-------+--------------+------------+--+--+---+---+----+--+---+---+----+------------+------------+------------+------------+
The 2nd column from the first file can be joined to the 14th column of the second file!
Here's what I've been trying to do:
join <(sort awk -F"\t" '{print $14,$12}' aecprda12.tab) <(sort awk -F"\t" '{print $2,$1}' output1.csv)
But I am getting these errors:
$ join <(sort awk -F"\t" '{print $14,$12}' aecprda12.tab) <(sort awk -F"\t" '{print $2,$1}' output1.csv)
sort: unknown option -- F
Try 'sort --help' for more information.
sort: unknown option -- F
Try 'sort --help' for more information.
-700476409 [waitproc] -bash 10336 sig_send: error sending signal 20 to pid 10336, pipe handle 0x84, Win32 error 109
The output I would like would be something like this:
+-------+-------+---------------+
| 12.99 | 14.77 | 3383510002151 |
| 13.97 | 17.96 | 3383510002175 |
| 13.2 | 13 | 3383510002267 |
| 13.74 | 14.19 | 3399240165349 |
| 9.43 | 9.52 | 3399240165363 |
| 12.99 | 4.97 | 3399240165479 |
| 7.16 | 7.48 | 3399240165677 |
| 11.24 | 9.43 | 4011550620286 |
| 13.86 | 13.43 | 4260182980316 |
| 13.98 | 12.99 | 4260182980507 |
| 10.97 | 13.97 | 4260182980514 |
| 11.96 | 13.2 | 4260182980545 |
| 15.88 | 13.74 | 4260182980552 |
+-------+-------+---------------+
What am I doing wrong?
You can do all the work in join and sort:
join -1 2 -2 14 -t $'\t' -o 2.12,1.1,0 \
<( sort -t $'\t' -k 2,2 output1.csv ) \
<( sort -t $'\t' -k 14,14 aecprda12.tab )
Notes:
$'\t' is a bash ANSI-C quoted string that produces a tab character: neither join nor sort seems to recognize the two-character string "\t" as a tab
-k col,col sorts the file on the specified column
join has several options to control how it works; see the join(1) man page.
sort awk -F...
is not a valid command; it means "sort a file named awk", and of course, as the error message says, sort has no -F option. The syntax you are looking for is
awk -F ... | sort
However, you might be better off doing the join directly in awk:
awk -F"\t" 'NR==FNR{k[$14]=$12; next}
k[$2] { print $2, $1, k[$2] }' aecprda12.tab output1.csv
I am assuming that you don't know whether every item in the first file has a corresponding item in the second file, and that you want only the "matching" items. There is indeed a good way to do this in awk. Create the following script (as a text file; call it myJoin.txt):
BEGIN {
    FS = "\t"
}

# loop around as long as the total number of records read
# equals the number of records read in this file -
# in other words, loop around the first file only
NR==FNR {
    a[$2] = $1   # create one array element for each $1/$2 pair
    next
}

# loop around all the lines of the second file,
# since we're done processing the first:
{
    gsub(/ /, "", $14)   # strip spaces from $14 (gsub removes all of them)
    if (a[$14]) {        # was the value in $14 seen in the first file?
        # print out the three values you care about:
        print $12 " " a[$14] " " $14
    }
}
Now execute this with
awk -f myJoin.txt output1.csv aecprda12.tab
Seems to work for me...
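As a side note, the NR==FNR test used above is the standard awk idiom for SQL-style joins; a toy demonstration with two made-up tab-separated files:
# a lookup file (id -> name) and a data file (id -> qty)
printf '1\talice\n2\tbob\n'    > names.tsv
printf '1\t10\n2\t20\n3\t30\n' > qty.tsv

# NR==FNR holds only while reading the first file: build the lookup there,
# then print only those lines of the second file whose key was seen
awk -F'\t' 'NR==FNR { name[$1] = $2; next }
            $1 in name { print $1, name[$1], $2 }' names.tsv qty.tsv
# output:
# 1 alice 10
# 2 bob 20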

Postgres: lag and lead with special conditions

Forgive what may be a silly question, but I'm not much of a database guru.
Here is my table:
 id_data | val_no3 | id_prev | id_next
---------+---------+---------+---------
       1 |         |         |       2
       2 |       7 |         |
       3 |         |       2 |       4
       4 |       5 |         |
       5 |         |       4 |      10
       6 |         |       4 |      10
       7 |         |       4 |      10
       8 |         |       4 |      10
       9 |         |       4 |      10
      10 |       8 |       4 |
In the table above:
id_prev is the id_data value of the preceding row, filled in when val_no3 is null;
id_next is the id_data value of the following row, filled in when val_no3 is null.
And now I would like to have this:
 id_data | val_no3 | id_prev | id_next | val_prev | val_next
---------+---------+---------+---------+----------+----------
       1 |         |         |       2 |          |        7
       2 |       7 |         |         |          |
       3 |         |       2 |       4 |        7 |        5
       4 |       5 |         |         |          |
       5 |         |       4 |      10 |        5 |        8
       6 |         |       4 |      10 |        5 |        8
       7 |         |       4 |      10 |        5 |        8
       8 |         |       4 |      10 |        5 |        8
       9 |         |       4 |      10 |        5 |        8
      10 |       8 |         |         |          |
The conditions are as follows:
If val_no3 is not null, then val_prev and val_next must be null.
If val_no3 is null, then:
val_prev must be equal to the preceding value of val_no3 (it should be null if the val_no3 that precedes is null too);
val_next must be equal to the following value of val_no3 (it should be null if the val_no3 that follows is null too).
I think I might have to use something with lag and lead, but I don't know how.
I would be very grateful if you could help me resolve this issue; thank you.
No need for analytic functions, just sub-selects. Something like the following (untested) should work:
select
    id_data,
    val_no3,
    id_prev,
    id_next,
    (select val_no3 from b where id_data = x.id_prev) as val_prev,
    (select val_no3 from b where id_data = x.id_next) as val_next
from
    b x
order by
    id_data;
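For completeness, a self-contained sketch (the table name b is the answer's assumption; the rows come from the question). A CASE guard is added so that rows whose val_no3 is not null get null val_prev/val_next, matching the desired output:
create table b (id_data int primary key, val_no3 int, id_prev int, id_next int);

insert into b values
    (1, null, null, 2),  (2, 7, null, null),  (3, null, 2, 4),
    (4, 5, null, null),  (5, null, 4, 10),    (6, null, 4, 10),
    (7, null, 4, 10),    (8, null, 4, 10),    (9, null, 4, 10),
    (10, 8, 4, null);

select
    id_data,
    val_no3,
    id_prev,
    id_next,
    case when val_no3 is null
         then (select val_no3 from b where id_data = x.id_prev) end as val_prev,
    case when val_no3 is null
         then (select val_no3 from b where id_data = x.id_next) end as val_next
from b x
order by id_data;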
